-
Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. However, they are currently an underutilized resource due to the manual effort required by taxonomic experts to perform challenging species differentiations based on subtle morphological differences, precluding widespread applications. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets spanning over 230 genera and 1769 species, with images ranging from controlled laboratory settings to challenging field-collected (in-situ) photographs. We further explore taxonomic classification in two important real-world contexts: sample efficiency and domain adaptation. Our results show that the Vision and Language Transformer combined with an MLP head is the best performing model, with 97% accuracy at genus and 94% at species level. Sample efficiency analysis shows that we can reduce train data requirements by up to 50% with minimal compromise in performance. The domain adaptation experiments reveal significant challenges when transferring models from lab to in-situ images, highlighting a critical domain gap. Overall, our study lays a foundation for large-scale automated taxonomic classification of beetles, and beyond that, advances sample-efficient learning and cross-domain adaptation for diverse long-tailed ecological datasets.
Free, publicly-accessible full text available July 18, 2026.
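The sample-efficiency result above (cutting training data by up to 50% with minimal performance loss) depends on subsampling that respects the long-tailed class distribution. A minimal stratified-subsampling sketch, for illustration only; the function name and `fraction` parameter are assumptions, not the paper's actual pipeline:

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return indices keeping `fraction` of samples per class,
    preserving the class balance of a long-tailed dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lbl in enumerate(labels):
        by_class[lbl].append(i)
    kept = []
    for idxs in by_class.values():
        # Keep at least one sample per class, since in a long-tailed
        # dataset many classes have only a handful of examples.
        k = max(1, round(len(idxs) * fraction))
        kept.extend(rng.sample(idxs, k))
    return sorted(kept)
```

Keeping at least one image per class matters here because, with over 1769 species, a naive random subsample would drop rare species entirely.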
-
In this paper, we extend the dataset statistics, model benchmarks, and performance analysis for the recently published KABR dataset, an in situ dataset for ungulate behavior recognition using aerial footage from the Mpala Research Centre in Kenya. The dataset comprises video footage of reticulated giraffes (lat. Giraffa reticulata), Plains zebras (lat. Equus quagga), and Grévy’s zebras (lat. Equus grevyi) captured using a DJI Mavic 2S drone. It includes both spatiotemporal (i.e., mini-scenes) and behavior annotations provided by an expert behavioral ecologist. In total, KABR has more than 10 hours of annotated video. We extend the previous work in four key areas by: (i) providing comprehensive dataset statistics to reveal new insights into the data distribution across behavior classes and species; (ii) extending the set of existing benchmark models to include a new state-of-the-art transformer; (iii) investigating weight initialization strategies and exploring whether pretraining on human action recognition datasets is transferable to in situ animal behavior recognition directly (i.e., zero-shot) or as initialization for end-to-end model training; and (iv) performing a detailed statistical analysis of the performance of these models across species, behavior, and formally defined segments of the long-tailed distribution. The KABR dataset addresses the limitations of previous datasets sourced from controlled environments, offering a more authentic representation of natural animal behaviors. This work marks a significant advancement in the automatic analysis of wildlife behavior, leveraging drone technology to overcome traditional observational challenges and enabling a more nuanced understanding of animal interactions in their natural habitats. The dataset is available at https://kabrdata.xyz
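A "mini-scene," as described above, is an image subregion centered on each tracked animal, extracted per frame and stacked into a short clip. A hypothetical sketch of the centering-and-clamping geometry only; the box format, output size, and helper name are assumptions, not the actual KABR detection-and-tracking code:

```python
def mini_scene_crop(box, frame_w, frame_h, out=224):
    """Given a detected animal's bounding box (x, y, w, h), return a
    square crop window (left, top, out, out) centered on the animal
    and clamped so it stays inside the frame."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2          # animal center
    half = out / 2
    # Clamp so the window never leaves the frame bounds.
    left = min(max(cx - half, 0), frame_w - out)
    top = min(max(cy - half, 0), frame_h - out)
    return int(left), int(top), out, out
```

Applying this per frame along a track, then stacking the crops in temporal order, yields the animal-centric clip that the behavior classifiers consume.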
-
We present a novel dataset for animal behavior recognition collected in-situ using video from drones flown over the Mpala Research Centre in Kenya. Videos from DJI Mavic 2S drones flown in January 2023 were acquired at 5.4K resolution in accordance with IACUC protocols, and processed to detect and track each animal in the frames. An image subregion centered on each animal was extracted and combined in sequence to form a “mini-scene”. Behaviors were then manually labeled for each frame of each mini-scene by a team of annotators overseen by an expert behavioral ecologist. The resulting labeled mini-scenes form our resulting behavior dataset, consisting of more than 10 hours of annotated videos of reticulated giraffes, plains zebras, and Grévy's zebras, and encompassing seven types of animal behavior and an additional category for occlusions. Benchmark results for state-of-the-art behavioral recognition architectures show labeling accuracy of 61.9% for macro-average (per class), and 86.7% for micro-average (per instance). Our dataset complements recent larger, more diverse animal behavior sets and smaller, more specialized ones by being collected in-situ and from drones, both important considerations for the future of animal behavior research. The dataset can be accessed at https://dirtmaxim.github.io/kabr
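The gap between the macro-average (61.9%) and micro-average (86.7%) accuracy reflects the long-tailed behavior distribution: micro-averaging weights every instance equally, while macro-averaging weights every class equally, so poorly recognized rare behaviors pull the macro score down. A minimal sketch of the two metrics, in plain Python for illustration (not the benchmark's evaluation code):

```python
def macro_micro_accuracy(y_true, y_pred):
    """Compute macro-average (per-class) and micro-average
    (per-instance) accuracy for a list of labels."""
    # Micro-average: overall fraction of correctly labeled instances.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro-average: per-class accuracy, averaged equally over classes,
    # so a rare behavior counts as much as a common one.
    per_class = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    macro = sum(per_class) / len(per_class)
    return macro, micro
```

For example, a model that labels 8 grazing frames correctly but misses both running frames scores 0.8 micro (8 of 10 instances) yet only 0.5 macro (perfect on one class, zero on the other), which is the pattern the benchmark numbers above exhibit.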